
NETOBSERV-2675: flp-informers deployment and wiring#2650

Open
leandroberetta wants to merge 13 commits into netobserv:main from leandroberetta:netobserv-1248

Conversation

@leandroberetta (Contributor) commented Apr 9, 2026

Description

Adds a dedicated flp-informers deployment (and RBAC) managed by the NetObserv operator, and wires flowlogs-pipeline processors to expose the k8scache gRPC endpoint so centralized Kubernetes informers can push cache updates to FLP.

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    logLevel: info
    consumerReplicas: 3

    # NEW MODE: Centralized Kubernetes informers enabled
    # This reduces load on the Kubernetes API server by having a single
    # component (flp-informers) query the API instead of N FLP processors
    informers:
      enabled: true
      replicas: 2  # Minimum 2 for HA with leader election
      resources:
        requests:
          memory: 128Mi
          cpu: 50m
        limits:
          memory: 256Mi
          cpu: 200m
      advanced:
        resyncInterval: 60      # Seconds to rediscover processors
        batchSize: 100          # Max entries per update batch
        sendTimeout: 10         # Seconds timeout for sending updates
        updateBufferSize: 100   # Update channel buffer size
        processorPort: 9090     # gRPC port for k8s cache updates

Dependencies

netobserv/flowlogs-pipeline#1221

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

  • New Features

    • Added a centralized FLP Informers component with configurable replicas, resources and advanced tuning.
  • Configuration Changes

    • New CRD fields under spec.processor.informers; default FLP metrics port updated (9090 → 9401).
    • Validation rejects user-configured ports that conflict with the reserved k8scache port (9090).
  • Chores

    • Expanded RBAC to include coordination.k8s.io/leases permissions.
  • Tests

    • Added unit tests for port-conflict validation.

@openshift-ci-robot (Collaborator) commented Apr 9, 2026

@leandroberetta: This pull request references NETOBSERV-2675 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Description

Adds a dedicated flp-informers deployment (and RBAC) managed by the NetObserv operator, and wires flowlogs-pipeline processors to expose the k8scache gRPC endpoint so centralized Kubernetes informers can push cache updates to FLP.

Dependencies

netobserv/flowlogs-pipeline#1221

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai (Bot) commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new FLP informers component: a dedicated reconciler and builder for flp-informers, reserves k8scache port (9090) with validation, moves informer RBAC/ownership out of monolith/transformer reconcilers, and extends API/CRD/helm/manifests, tests, and docs for informer configuration.

Changes

  • Core FLP controller changes (internal/controller/flp/flp_controller.go, internal/controller/flp/flp_monolith_reconciler.go, internal/controller/flp/flp_transfo_reconciler.go): adds the informer reconciler to the main loop; removes informer ClusterRoleBinding reconciliation from the monolith/transformer reconcilers.
  • Informer implementation (internal/controller/flp/flp_informer_builder.go, internal/controller/flp/flp_informer_reconciler.go): new builder and reconciler implement the flp-informers Deployment, ServiceAccount, RBAC, container args/ports/probes, defaults, and status handling.
  • Port reservation & builder validation (internal/controller/flp/flp_common_objects.go, internal/controller/flp/flp_monolith_objects.go, internal/controller/flp/flp_transfo_objects.go): introduces k8scachePort = 9090, adds validatePortConflicts(...) to reject user ports that equal 9090, exposes the k8scache port, and appends --k8scache.* args (a validation sketch follows this list).
  • API, CRD, webhook (api/flowcollector/v1beta2/flowcollector_types.go, api/flowcollector/v1beta2/flowcollector_validation_webhook.go, config/crd/bases/...flowcollectors.yaml, bundle/manifests/...flowcollectors.yaml, helm/crds/...flowcollectors.yaml): adds spec.processor.informers types/schema, defaults and validation (enabled, replicas, resources, advanced tuning); the webhook enforces replicas >= 2 when enabled.
  • Manifests, Helm, RBAC (bundle/manifests/netobserv-informers_rbac...clusterrole.yaml, bundle/manifests/netobserv-operator.clusterserviceversion.yaml, config/rbac/component_roles.yaml, config/rbac/role.yaml, helm/templates/clusterrole.yaml, helm/templates/netobserv-informers_rbac...clusterrole.yaml, helm/templates/netobserv-informers_rbac...clusterrolebinding.yaml, internal/pkg/manager/manager.go): adds coordination.k8s.io/leases RBAC rules and a new ClusterRoleBinding for flp-informers; updates CSV descriptors and the operator RBAC annotation.
  • Status, tests, docs, samples (internal/pkg/manager/status/status_manager.go, internal/controller/flp/flp_test.go, internal/controller/flp/flp_controller_test.go, config/samples/flows_v1beta2_flowcollector.yaml, docs/FlowCollector.md): adds the FLPInformers status component constant; updates/extends tests for the flp-informers Deployment/RBAC and port-conflict validation; adds sample and docs entries for the informers config.
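
To illustrate the reserved-port check named in the table, here is a minimal sketch of what such a validation could look like (the helper name comes from the walkthrough, but its signature and the exact set of checked ports are assumptions, not the PR's actual code):

```go
package flp

import "fmt"

// k8scachePort is reserved for the gRPC endpoint on FLP processors that
// receives cache updates pushed by the flp-informers deployment.
const k8scachePort int32 = 9090

// validatePortConflicts rejects any user-configured port that collides with
// the reserved k8scache port. Keys are only used to build the error message.
func validatePortConflicts(ports map[string]int32) error {
	for name, p := range ports {
		if p == k8scachePort {
			return fmt.Errorf("%s port %d conflicts with the reserved k8scache port %d", name, p, k8scachePort)
		}
	}
	return nil
}
```

Called with, for example, the configured metrics, health, and profiling ports before the container ports are built, this turns a silent port collision into an explicit reconcile error.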

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 18.18%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the PR title "NETOBSERV-2675: flp-informers deployment and wiring" accurately summarizes the main change: adding a dedicated flp-informers deployment and wiring it to FLP processors.
  • Description check — ✅ Passed: the PR description includes a clear overview of the changes, an example configuration, a dependencies link, and completed checklist items, as required by the template.




@coderabbitai (Bot) left a comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
internal/controller/flp/flp_controller_test.go (1)

173-174: Use informerName constant instead of string literal

Replace hardcoded "flp-informers" in expectations with informerName to avoid drift if naming changes.

Also applies to: 336-337

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_controller_test.go` around lines 173 - 174,
Replace the hardcoded string "flp-informers" in the test expectations with the
existing informerName constant so the assertions use informerName instead of a
literal; update the checks that reference rb3.Subjects[0].Name and any other
assertions (e.g., the ones at the other occurrence around lines noted) to
compare to informerName, leaving the RoleRef check (rb3.RoleRef.Name) as-is
unless it should also be replaced by a corresponding constant.
internal/controller/flp/flp_informer_builder.go (1)

60-69: Consider making informer resources tunable

Hardcoding informer resources may under-size large clusters. Consider deriving from config/CR defaults so limits can be adjusted without code changes.

As per coding guidelines "Monitor memory usage impact for large scale clusters - watch for memory growth with large flow volumes and optimize cacheMaxFlows, cacheActiveTimeout, writeBatchWait, writeBatchSize settings".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_builder.go` around lines 60 - 69, The
ResourceRequirements for the informer are hardcoded (corev1.ResourceRequirements
with fixed Requests/Limits); make these values configurable by reading from the
controller config or CR defaults instead of literals—e.g., add config/CR fields
(or reuse existing controller config) for informer.memoryRequest,
informer.cpuRequest, informer.memoryLimit, informer.cpuLimit, and use them when
constructing the ResourceRequirements in flp_informer_builder.go (fall back to
the current "128Mi"/"50m"/"256Mi"/"200m" defaults if unset); ensure the
builder/constructor that creates the informer (the code that sets
ResourceRequirements) accepts or fetches the config/CR and applies those values
so operators can tune
cacheMaxFlows/cacheActiveTimeout/writeBatchWait/writeBatchSize and informer
resource limits without code changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bundle.Dockerfile`:
- Line 10: The LABEL
"operators.operatorframework.io.metrics.builder=operator-sdk-v1.40.0" shows a
downgrade to v1.40.0; verify and make the bundle's operator-sdk version
consistent across files (e.g., update this LABEL or
bundle/metadata/annotations.yaml) so all bundle files reference the same
operator-sdk version (ensure the LABEL value and the entry in
bundle/metadata/annotations.yaml match the intended version).

In `@bundle/manifests/netobserv-operator.clusterserviceversion.yaml`:
- Line 99: The CSV contains the builder annotation
operators.operatorframework.io/builder set to operator-sdk-v1.40.0 which
downgrades from v1.42.0; confirm intended SDK version and update the annotation
value (operators.operatorframework.io/builder) to the correct version (e.g.,
operator-sdk-v1.42.0) to match the rest of the bundle and your intended build
tooling so all CSVs use the same operator-sdk version.

In `@bundle/metadata/annotations.yaml`:
- Line 9: The bundle annotations indicate operator-sdk version
operator-sdk-v1.40.0 but the Makefile uses v1.42.0; update the bundle generation
to match the Makefile or vice versa: regenerate the bundle artifacts
(annotations.yaml, clusterserviceversion.yaml, bundle.Dockerfile) using
operator-sdk v1.42.0 so operators.operatorframework.io.metrics.builder reads
operator-sdk-v1.42.0, or change the Makefile/operator-sdk reference back to
v1.40.0 to match the existing bundle—ensure the versions are identical across
the Makefile and the generated bundle files.

In `@internal/controller/flp/flp_common_objects.go`:
- Around line 131-135: The code hardcodes a "k8scache" ContainerPort at 9090
which can conflict with user-configured FLP/health/metrics/profile ports; add
validation before Pod spec/container args are generated to detect any
user-configured port equal to 9090 (or vice-versa) and return an error instead
of proceeding. Locate the code that builds the ports slice (where ports =
append(... Name: "k8scache", ContainerPort: 9090 ...)) and the routine that
composes process args for the FLP/health/metrics/profile settings, then
implement a check that scans those config fields for equality with 9090 and
rejects the Pod creation with a clear error message if a collision is found.
Ensure the validation runs early (before creating corev1.ContainerPort entries)
and covers the other hardcoded occurrence mentioned (lines ~174-178).

In `@internal/controller/flp/flp_informer_builder.go`:
- Around line 39-70: The informer Container construction (variable "container"
created with Name informerName and Image b.Images[reconcilers.MainImage]) lacks
a hardened SecurityContext; add SecurityContext:
&corev1.SecurityContext{RunAsNonRoot: pointer.BoolPtr(true), RunAsUser:
pointer.Int64Ptr(1000), AllowPrivilegeEscalation: pointer.BoolPtr(false),
ReadOnlyRootFilesystem: pointer.BoolPtr(true), Privileged:
pointer.BoolPtr(false), Capabilities: &corev1.Capabilities{Drop:
[]corev1.Capability{"ALL"}}, SeccompProfile: &corev1.SeccompProfile{Type:
corev1.SeccompProfileTypeRuntimeDefault}} to that container definition to match
other FLP workloads, and apply the same changes to the other informer Container
construction in this file (the second container block).

In `@internal/controller/flp/flp_informer_reconciler.go`:
- Around line 54-57: Multiple reconciliation steps in flp_informer_reconciler.go
return bare errors (e.g., returns from r.Managed.FetchAll, r.ensureConfigured,
r.Lister.GetParentByName, r.Managed.Create/Update/Delete) which loses context;
update each return to wrap the underlying error with contextual text using the
standard error-wrapping pattern (e.g., fmt.Errorf("step X failed: %w", err) or
errors.Wrap) inside the Reconcile method (and any helper methods like
ensureConfigured) so callers see which reconcile sub-step failed—replace each
bare "return err" with a descriptive wrapped error referencing the failing call
(e.g., "fetch all managed resources", "ensure configured", "get parent by name",
"create managed resource", "update managed resource", etc.).

---

Nitpick comments:
In `@internal/controller/flp/flp_controller_test.go`:
- Around line 173-174: Replace the hardcoded string "flp-informers" in the test
expectations with the existing informerName constant so the assertions use
informerName instead of a literal; update the checks that reference
rb3.Subjects[0].Name and any other assertions (e.g., the ones at the other
occurrence around lines noted) to compare to informerName, leaving the RoleRef
check (rb3.RoleRef.Name) as-is unless it should also be replaced by a
corresponding constant.

In `@internal/controller/flp/flp_informer_builder.go`:
- Around line 60-69: The ResourceRequirements for the informer are hardcoded
(corev1.ResourceRequirements with fixed Requests/Limits); make these values
configurable by reading from the controller config or CR defaults instead of
literals—e.g., add config/CR fields (or reuse existing controller config) for
informer.memoryRequest, informer.cpuRequest, informer.memoryLimit,
informer.cpuLimit, and use them when constructing the ResourceRequirements in
flp_informer_builder.go (fall back to the current "128Mi"/"50m"/"256Mi"/"200m"
defaults if unset); ensure the builder/constructor that creates the informer
(the code that sets ResourceRequirements) accepts or fetches the config/CR and
applies those values so operators can tune
cacheMaxFlows/cacheActiveTimeout/writeBatchWait/writeBatchSize and informer
resource limits without code changes.
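
For the SecurityContext hardening mentioned in the flp_informer_builder.go comment above, a minimal sketch using k8s.io/utils/ptr (whether the PR applies exactly these fields is an assumption; the review also suggests RunAsUser: 1000, which is often left unset on OpenShift so the SCC can assign a UID):

```go
package flp

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/utils/ptr"
)

// hardenedSecurityContext mirrors the restricted profile used by other FLP
// workloads: non-root, no privilege escalation, read-only root filesystem,
// all capabilities dropped, and the runtime default seccomp profile.
func hardenedSecurityContext() *corev1.SecurityContext {
	return &corev1.SecurityContext{
		RunAsNonRoot:             ptr.To(true),
		AllowPrivilegeEscalation: ptr.To(false),
		ReadOnlyRootFilesystem:   ptr.To(true),
		Privileged:               ptr.To(false),
		Capabilities: &corev1.Capabilities{
			Drop: []corev1.Capability{"ALL"},
		},
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault,
		},
	}
}
```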

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 93c5b91f-3f2c-44e2-bed2-f73d938ec6ea

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad9ed9 and f676fa1.

📒 Files selected for processing (21)
  • bundle.Dockerfile
  • bundle/manifests/flows.netobserv.io_flowcollectors.yaml
  • bundle/manifests/flows.netobserv.io_flowcollectorslices.yaml
  • bundle/manifests/flows.netobserv.io_flowmetrics.yaml
  • bundle/manifests/netobserv-operator.clusterserviceversion.yaml
  • bundle/metadata/annotations.yaml
  • config/crd/bases/flows.netobserv.io_flowcollectors.yaml
  • config/crd/bases/flows.netobserv.io_flowcollectorslices.yaml
  • config/crd/bases/flows.netobserv.io_flowmetrics.yaml
  • helm/crds/flows.netobserv.io_flowcollectors.yaml
  • helm/crds/flows.netobserv.io_flowcollectorslices.yaml
  • helm/crds/flows.netobserv.io_flowmetrics.yaml
  • internal/controller/flp/flp_common_objects.go
  • internal/controller/flp/flp_controller.go
  • internal/controller/flp/flp_controller_test.go
  • internal/controller/flp/flp_informer_builder.go
  • internal/controller/flp/flp_informer_reconciler.go
  • internal/controller/flp/flp_monolith_objects.go
  • internal/controller/flp/flp_monolith_reconciler.go
  • internal/controller/flp/flp_transfo_reconciler.go
  • internal/pkg/manager/status/status_manager.go
💤 Files with no reviewable changes (2)
  • internal/controller/flp/flp_transfo_reconciler.go
  • internal/controller/flp/flp_monolith_reconciler.go

Comment thread bundle.Dockerfile Outdated
Comment thread bundle/manifests/netobserv-operator.clusterserviceversion.yaml Outdated
Comment thread bundle/metadata/annotations.yaml Outdated
Comment thread internal/controller/flp/flp_common_objects.go
Comment thread internal/controller/flp/flp_informer_builder.go
Comment thread internal/controller/flp/flp_informer_reconciler.go
@openshift-ci-robot (Collaborator) commented Apr 9, 2026

@leandroberetta: This pull request references NETOBSERV-2675 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Description

Adds a dedicated flp-informers deployment (and RBAC) managed by the NetObserv operator, and wires flowlogs-pipeline processors to expose the k8scache gRPC endpoint so centralized Kubernetes informers can push cache updates to FLP.

Dependencies

netobserv/flowlogs-pipeline#1221

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

  • Chores

  • Updated build/tool annotations (operator-sdk and controller-gen versions).

  • New Features

  • Added an FLP informer component: deployable informer workload with its own ServiceAccount, RBAC, Deployment, k8scache TCP port (9090) exposed, and new status component for informer health.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Apr 10, 2026

@leandroberetta: This pull request references NETOBSERV-2675 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Description

Adds a dedicated flp-informers deployment (and RBAC) managed by the NetObserv operator, and wires flowlogs-pipeline processors to expose the k8scache gRPC endpoint so centralized Kubernetes informers can push cache updates to FLP.

Dependencies

netobserv/flowlogs-pipeline#1221

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

Release Notes

  • New Features

  • Introduced a new FLP Informers component that runs as a separate deployment for improved resource management.

  • Added port conflict validation to prevent configuration errors when user-selected ports conflict with the reserved k8scache port (9090).

  • Configuration Changes

  • Updated default metrics server port from 9090 to 9401 to avoid port conflicts.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai (Bot) left a comment

🧹 Nitpick comments (5)
internal/controller/flp/flp_informer_reconciler.go (1)

60-64: Consider handling TryDeleteAll errors.

TryDeleteAll return value is ignored. While "Try" implies best-effort, logging failures would aid debugging cleanup issues.

Suggested improvement
 	if desired.Spec.OnHold() {
 		r.Status.SetUnused("FlowCollector is on hold")
-		r.Managed.TryDeleteAll(ctx)
+		if err := r.Managed.TryDeleteAll(ctx); err != nil {
+			log.FromContext(ctx).Error(err, "failed to delete some managed resources")
+		}
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_reconciler.go` around lines 60 - 64,
When handling the OnHold branch, capture the error returned by
r.Managed.TryDeleteAll(ctx) instead of ignoring it; call it into a variable
(e.g., err := r.Managed.TryDeleteAll(ctx)) and if non-nil emit a diagnostic (use
the reconciler logger or status updater such as r.Log.Error(err, "failed to
cleanup managed resources on hold") or r.Status.SetError/SetDegraded with
context) so cleanup failures are recorded, then continue returning nil as
before.
internal/controller/flp/flp_informer_builder.go (3)

60-68: Hardcoded resource limits.

Resource requirements are fixed at 128Mi/256Mi memory and 50m/200m CPU. Other FLP components use desired.Processor.Resources from the CR spec. Consider exposing informer resources via config or document why fixed values are appropriate here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_builder.go` around lines 60 - 68, The
informer is using hardcoded resource values in the Resources block; replace the
fixed corev1.ResourceRequirements with the FLP spec's resource settings (use
desired.Processor.Resources) or thread a configurable field through the informer
builder so callers can supply resource requirements; update the builder function
that constructs the informer pod (look for Resources:
corev1.ResourceRequirements in flp_informer_builder.go) to read and apply
desired.Processor.Resources (or a new InformerResources field) and ensure
fallback/default values are documented or applied when the CR value is nil.

90-96: Pod template missing part-of label.

Deployment ObjectMeta.Labels includes "part-of": constants.OperatorName but the pod template labels don't. Other FLP builders (monolith, transformer) include this label in their pod templates for consistency.

Suggested fix
 			Template: corev1.PodTemplateSpec{
 				ObjectMeta: metav1.ObjectMeta{
 					Labels: map[string]string{
+						"part-of": constants.OperatorName,
 						"app":     informerName,
 						"version": version,
 					},
 				},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_builder.go` around lines 90 - 96, The
pod template under Template: corev1.PodTemplateSpec for the FLP informer builder
is missing the "part-of" label; update the ObjectMeta.Labels map inside the
PodTemplateSpec (the block that currently sets "app": informerName and
"version": version) to also include "part-of": constants.OperatorName so the pod
template matches the Deployment ObjectMeta.Labels and other FLP builders
(monolith, transformer).

29-104: Method signature returns error but never fails.

deployment() returns (*appsv1.Deployment, error) but always returns nil error. Either simplify the signature to return only *appsv1.Deployment, or add validation that could fail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_builder.go` around lines 29 - 104, The
deployment() method on informerBuilder currently returns (*appsv1.Deployment,
error) but never returns a non-nil error; either remove the unused error return
or add meaningful validation. To fix quickly, change informerBuilder.deployment
signature to return *appsv1.Deployment (no error), update its return to only the
Deployment value, and update all callers to accept the single return (search for
calls to deployment() and adjust unpacking). Alternatively, if you prefer
validation, add checks inside deployment (e.g., ensure
b.Images[reconcilers.MainImage] is non-empty and b.desired is non-nil) and
return an error when invalid instead of always nil; update callers accordingly
to handle the error.
internal/controller/flp/flp_monolith_objects.go (1)

183-188: Use constant and consistent type for k8scache port.

Line 185 hardcodes 9090 and line 187 uses intstr.FromInt() instead of intstr.FromInt32(). Use the k8scachePort constant for maintainability and match the int32 pattern used on line 181.

Suggested fix
 			{
 				Name:       "k8scache",
-				Port:       9090,
+				Port:       k8scachePort,
 				Protocol:   corev1.ProtocolTCP,
-				TargetPort: intstr.FromInt(9090),
+				TargetPort: intstr.FromInt32(k8scachePort),
 			},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_monolith_objects.go` around lines 183 - 188,
Replace the hardcoded 9090 and the intstr.FromInt call with the k8scachePort
constant and an int32 conversion so types are consistent: set Port to
k8scachePort and TargetPort to intstr.FromInt32(int32(k8scachePort)) for the
object with Name "k8scache" (the same pattern used on the earlier port
definition).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/controller/flp/flp_informer_builder.go`:
- Around line 60-68: The informer is using hardcoded resource values in the
Resources block; replace the fixed corev1.ResourceRequirements with the FLP
spec's resource settings (use desired.Processor.Resources) or thread a
configurable field through the informer builder so callers can supply resource
requirements; update the builder function that constructs the informer pod (look
for Resources: corev1.ResourceRequirements in flp_informer_builder.go) to read
and apply desired.Processor.Resources (or a new InformerResources field) and
ensure fallback/default values are documented or applied when the CR value is
nil.
- Around line 90-96: The pod template under Template: corev1.PodTemplateSpec for
the FLP informer builder is missing the "part-of" label; update the
ObjectMeta.Labels map inside the PodTemplateSpec (the block that currently sets
"app": informerName and "version": version) to also include "part-of":
constants.OperatorName so the pod template matches the Deployment
ObjectMeta.Labels and other FLP builders (monolith, transformer).
- Around line 29-104: The deployment() method on informerBuilder currently
returns (*appsv1.Deployment, error) but never returns a non-nil error; either
remove the unused error return or add meaningful validation. To fix quickly,
change informerBuilder.deployment signature to return *appsv1.Deployment (no
error), update its return to only the Deployment value, and update all callers
to accept the single return (search for calls to deployment() and adjust
unpacking). Alternatively, if you prefer validation, add checks inside
deployment (e.g., ensure b.Images[reconcilers.MainImage] is non-empty and
b.desired is non-nil) and return an error when invalid instead of always nil;
update callers accordingly to handle the error.

In `@internal/controller/flp/flp_informer_reconciler.go`:
- Around line 60-64: When handling the OnHold branch, capture the error returned
by r.Managed.TryDeleteAll(ctx) instead of ignoring it; call it into a variable
(e.g., err := r.Managed.TryDeleteAll(ctx)) and if non-nil emit a diagnostic (use
the reconciler logger or status updater such as r.Log.Error(err, "failed to
cleanup managed resources on hold") or r.Status.SetError/SetDegraded with
context) so cleanup failures are recorded, then continue returning nil as
before.

In `@internal/controller/flp/flp_monolith_objects.go`:
- Around line 183-188: Replace the hardcoded 9090 and the intstr.FromInt call
with the k8scachePort constant and an int32 conversion so types are consistent:
set Port to k8scachePort and TargetPort to intstr.FromInt32(int32(k8scachePort))
for the object with Name "k8scache" (the same pattern used on the earlier port
definition).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5085d614-83f7-410f-bbeb-7e640cf41e4c

📥 Commits

Reviewing files that changed from the base of the PR and between 05bd730 and 79953b0.

📒 Files selected for processing (6)
  • internal/controller/flp/flp_common_objects.go
  • internal/controller/flp/flp_informer_builder.go
  • internal/controller/flp/flp_informer_reconciler.go
  • internal/controller/flp/flp_monolith_objects.go
  • internal/controller/flp/flp_test.go
  • internal/controller/flp/flp_transfo_objects.go

@jpinsonneau (Member) left a comment

I feel we should make everything configurable and hide the options we don't want to expose much in the advanced config.

The number of replicas is mandatory here since the cache may fail and we'll lose the whole pipeline.

Also, the cache component should expose health checks and report some metrics in the health dashboard.

Comment on lines +66 to +81
builder := newInformerBuilder(r.Instance, &desired.Spec)

// Reconcile ServiceAccount
if err := r.reconcileServiceAccount(ctx, &builder); err != nil {
return fmt.Errorf("failed to reconcile service account: %w", err)
}

// Reconcile RBAC
if err := r.reconcilePermissions(ctx); err != nil {
return fmt.Errorf("failed to reconcile permissions: %w", err)
}

// Reconcile Deployment
if err := r.reconcileDeployment(ctx, &builder); err != nil {
return fmt.Errorf("failed to reconcile deployment: %w", err)
}
Member:

I feel we should make this configurable. On a small cluster with few FLPs (maybe even one), it doesn't make sense to have the cache separated. WDYT?

Contributor (Author):

Makes sense, I added:

    # Centralized Kubernetes informers (enabled by default for reduced API server load)                  
    # informers:                                                                                         
    #   enabled: true                                                                                    
    #   replicas: 2  # Minimum 2 for HA with leader election

Comment thread internal/controller/flp/flp_common_objects.go
Command: []string{"/app/flp-informers"},
Args: []string{
fmt.Sprintf("--processor-selector=%s", processorSelector),
"--processor-port=9090",
Member:

Should rely on k8scachePort const

Comment thread internal/controller/flp/flp_monolith_objects.go Outdated
Name: "k8scache",
Port: 9090,
Protocol: corev1.ProtocolTCP,
TargetPort: intstr.FromInt(9090),
Member:

Should rely on k8scachePort const

Comment thread internal/controller/flp/flp_informer_builder.go Outdated
}

func (b *informerBuilder) deployment() (*appsv1.Deployment, error) {
var replicas int32 = 1
Member:

I have concerns about this. It is not resilient to have a single replica.

It would be interesting to make it configurable and have a leader election mechanism.

Contributor (Author):

I've added a leader election mechanism.
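
For reference, a generic client-go leader-election sketch (not the code added in this PR; the lease name and namespace are placeholders). It also shows why the coordination.k8s.io/leases RBAC rules appear in this change: the lock is a Lease object that the elected replica keeps renewing.

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithLeaderElection starts the informers only on the replica holding the lease.
func runWithLeaderElection(ctx context.Context, startInformers func(context.Context)) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	id, _ := os.Hostname()

	// The Lease is what requires the coordination.k8s.io/leases verbs in RBAC.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "flp-informers", Namespace: "netobserv"}, // placeholders
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: startInformers,        // only the leader watches and pushes updates
			OnStoppedLeading: func() { os.Exit(0) }, // let the Deployment restart the pod as a follower
		},
	})
	return nil
}
```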

Comment on lines +60 to +69
Resources: corev1.ResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceMemory: resource.MustParse("128Mi"),
corev1.ResourceCPU: resource.MustParse("50m"),
},
Limits: corev1.ResourceList{
corev1.ResourceMemory: resource.MustParse("256Mi"),
corev1.ResourceCPU: resource.MustParse("200m"),
},
},
Member:

This should be configurable in the advanced options

Contributor (Author):

It is now configurable:

    # Centralized Kubernetes informers (enabled by default for reduced API server load)                  
    # informers:                                                                                         
    #   enabled: true                                                                                    
    #   replicas: 2  # Minimum 2 for HA with leader election                                             
    #   resources:                                                                                       
    #     requests:
    #       memory: 128Mi                                                                                
    #       cpu: 50m                                        
    #     limits:
    #       memory: 256Mi
    #       cpu: 200m
    #   advanced:
    #     resyncInterval: 60      # Seconds to rediscover processors
    #     batchSize: 100          # Max entries per update batch                                         
    #     sendTimeout: 10         # Seconds timeout for sending updates
    #     updateBufferSize: 100   # Update channel buffer size

I can move it to the advanced section if you want.

Member:

That looks good to me! Thanks @leandroberetta

@openshift-ci (Bot) commented Apr 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jpinsonneau. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot (Collaborator) commented Apr 15, 2026

@leandroberetta: This pull request references NETOBSERV-2675 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Description

Adds a dedicated flp-informers deployment (and RBAC) managed by the NetObserv operator, and wires flowlogs-pipeline processors to expose the k8scache gRPC endpoint so centralized Kubernetes informers can push cache updates to FLP.

Dependencies

netobserv/flowlogs-pipeline#1221

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

  • New Features

  • Added a new FLP Informers component (centralized informers deployment) with configurable replicas and advanced tuning.

  • Configuration Changes

  • New CRD/config fields under spec.processor.informers; default processor metrics port changed from 9090 to 9401.

  • Added validation to prevent user ports conflicting with reserved k8scache port (9090).

  • Chores

  • Expanded RBAC to include coordination.k8s.io/leases permissions.

  • Tests

  • Added port-conflict validation unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai (Bot) left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bundle/manifests/flows.netobserv.io_flowcollectors.yaml`:
- Around line 5552-5559: The YAML's `replicas` schema currently sets `minimum:
1` but the comment requires `replicas >= 2` when `enabled` is true; update the
CRD validation to enforce that conditional rule (or relax the comment). Add an
x-kubernetes-validations rule for the relevant `informers`/`flp-informers`
section that asserts "if .enabled == true then .replicas >= 2" (using the CRD
validation expression language), or alternatively change `minimum: 1` to
`minimum: 2` and adjust the description; target the `replicas` and `enabled`
fields in the same schema block so the validation is applied correctly.

In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml`:
- Around line 5121-5127: The schema for the "replicas" field currently sets
minimum: 1 while the description for the flp-informers deployment mandates a
minimum of 2; update either the validation or the description to remove the
conflict: either change the "replicas" property minimum from 1 to 2 to enforce
the described requirement, or modify the description to state that "a minimum of
2 replicas is recommended when `enabled` is `true`" (making it a recommendation
rather than a hard requirement); target the "replicas" field under the
flp-informers deployment definition so the change aligns schema validation with
the description.

In `@config/rbac/component_roles.yaml`:
- Around line 99-107: The informers ClusterRole is missing the "watch" verb for
the coordination.k8s.io/leases permission; update the informers ClusterRole
entry that lists apiGroups: coordination.k8s.io and resources: leases to include
"watch" alongside create, get, list, update so controller-runtime leader
election can observe lease changes (i.e., modify the informers ClusterRole's
leases verbs to add "watch").

In `@docs/FlowCollector.md`:
- Around line 10709-10715: Update the replicas description for the flp-informers
setting to remove the apparent contradiction: make the conditional explicit by
stating that when `enabled` is true a minimum of 2 replicas is
recommended/required for high availability, while the schema default/minimum
remains 1 otherwise; reference the `replicas` field and the `enabled` flag in
the text so readers understand the constraint applies only when `enabled=true`
and not to the base `<i>Minimum</i>: 1` schema value.

In `@internal/controller/flp/flp_informer_builder.go`:
- Around line 63-79: After reading config.Advanced.ProcessorPort into
processorPort, validate it against the informer’s reserved ports before
constructing the container/grpc port: ensure processorPort does not equal 8080
or 9091 (and any other ports used by the informer/process like k8scachePort),
and if it conflicts return an error (or reject the value) from the informer
builder path so the reconciler (flp_informer_reconciler.go) cannot create a pod
with conflicting ports; implement the check right after the assignment from
config.Advanced.ProcessorPort and before using processorPort to populate the
container port fields, referencing processorPort and
config.Advanced.ProcessorPort in the validation logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6b8e2942-9cb9-4c51-9617-41a8afdfb570

📥 Commits

Reviewing files that changed from the base of the PR and between 79953b0 and 326ff55.

⛔ Files ignored due to path filters (1)
  • api/flowcollector/v1beta2/zz_generated.deepcopy.go is excluded by !**/zz_generated.deepcopy.go
📒 Files selected for processing (18)
  • api/flowcollector/v1beta2/flowcollector_types.go
  • api/flowcollector/v1beta2/flowcollector_validation_webhook.go
  • bundle/manifests/flows.netobserv.io_flowcollectors.yaml
  • bundle/manifests/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • bundle/manifests/netobserv-operator.clusterserviceversion.yaml
  • config/crd/bases/flows.netobserv.io_flowcollectors.yaml
  • config/rbac/component_roles.yaml
  • config/rbac/role.yaml
  • config/samples/flows_v1beta2_flowcollector.yaml
  • docs/FlowCollector.md
  • helm/crds/flows.netobserv.io_flowcollectors.yaml
  • helm/templates/clusterrole.yaml
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrolebinding.yaml
  • internal/controller/flp/flp_informer_builder.go
  • internal/controller/flp/flp_informer_reconciler.go
  • internal/controller/flp/flp_monolith_objects.go
  • internal/pkg/manager/manager.go
✅ Files skipped from review due to trivial changes (6)
  • config/rbac/role.yaml
  • internal/pkg/manager/manager.go
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • config/samples/flows_v1beta2_flowcollector.yaml
  • bundle/manifests/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrolebinding.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/flp/flp_informer_reconciler.go

Comment on lines +5552 to +5559
replicas:
default: 2
description: |-
`replicas` defines the number of replicas for the flp-informers deployment.
For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
format: int32
minimum: 1
type: integer

⚠️ Potential issue | 🟡 Minor

Enforce the HA replica rule in schema (or relax the wording).

Line 5556 says a minimum of 2 replicas is required when enabled, but the schema only enforces minimum: 1 (Line 5558) with no conditional rule. enabled: true + replicas: 1 would still validate.

Suggested CRD fix
                   informers:
@@
                     properties:
@@
                       replicas:
                         default: 2
                         description: |-
                           `replicas` defines the number of replicas for the flp-informers deployment.
                           For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
                         format: int32
                         minimum: 1
                         type: integer
@@
-                    type: object
+                    type: object
+                    x-kubernetes-validations:
+                    - message: replicas must be at least 2 when informers are enabled
+                      rule: self.enabled != true || !has(self.replicas) || self.replicas >= 2
#!/bin/bash
set -euo pipefail

# Verify whether webhook already enforces "enabled => replicas >= 2"
fd -i 'flowcollector_validation_webhook.go' -x sh -c '
  echo "==> $1"
  rg -n "informers|replicas|enabled|at least 2|>=\\s*2" "$1"
' sh {}

# Verify current CRD validation presence for informers replicas
fd -i 'flows.netobserv.io_flowcollectors.yaml' -x sh -c '
  echo "==> $1"
  rg -n "informers:|replicas:|x-kubernetes-validations|enabled:" "$1"
' sh {}

Also applies to: 5628-5628

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bundle/manifests/flows.netobserv.io_flowcollectors.yaml` around lines 5552 -
5559, The YAML's `replicas` schema currently sets `minimum: 1` but the comment
requires `replicas >= 2` when `enabled` is true; update the CRD validation to
enforce that conditional rule (or relax the comment). Add an
x-kubernetes-validations rule for the relevant `informers`/`flp-informers`
section that asserts "if .enabled == true then .replicas >= 2" (using the CRD
validation expression language), or alternatively change `minimum: 1` to
`minimum: 2` and adjust the description; target the `replicas` and `enabled`
fields in the same schema block so the validation is applied correctly.

Comment on lines +5121 to +5127
default: 2
description: |-
`replicas` defines the number of replicas for the flp-informers deployment.
For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
format: int32
minimum: 1
type: integer

⚠️ Potential issue | 🟡 Minor

Align replicas validation with its description.

The description says a minimum of 2 replicas is required when enabled, but Line 5126 allows minimum: 1. Either enforce 2 in schema/webhook or reword this as a recommendation to avoid conflicting API behavior.

Suggested wording fix (if 1 replica is intentionally allowed)
-                            For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
+                            For high availability, 2 or more replicas are recommended when `enabled` is `true`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml` around lines 5121 -
5127, The schema for the "replicas" field currently sets minimum: 1 while the
description for the flp-informers deployment mandates a minimum of 2; update
either the validation or the description to remove the conflict: either change
the "replicas" property minimum from 1 to 2 to enforce the described
requirement, or modify the description to state that "a minimum of 2 replicas is
recommended when `enabled` is `true`" (making it a recommendation rather than a
hard requirement); target the "replicas" field under the flp-informers
deployment definition so the change aligns schema validation with the
description.
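
If the rule is kept in the webhook rather than as a CEL rule in the CRD, the check boils down to something like this (a sketch; the Go field names are assumed from the spec.processor.informers API added in this PR):

```go
package v1beta2

import "fmt"

// validateInformers enforces the HA rule discussed above: when informers are
// enabled, fewer than 2 replicas leaves leader election with no failover.
func validateInformers(enabled bool, replicas *int32) error {
	if !enabled {
		return nil
	}
	if replicas != nil && *replicas < 2 {
		return fmt.Errorf("spec.processor.informers.replicas must be at least 2 when informers are enabled (got %d)", *replicas)
	}
	return nil
}
```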

Comment thread config/rbac/component_roles.yaml
Comment thread docs/FlowCollector.md
Comment on lines +63 to +79
processorPort := int32(k8scachePort)
if config.Advanced != nil {
if config.Advanced.ResyncInterval != nil {
resyncInterval = *config.Advanced.ResyncInterval
}
if config.Advanced.BatchSize != nil {
batchSize = *config.Advanced.BatchSize
}
if config.Advanced.SendTimeout != nil {
sendTimeout = *config.Advanced.SendTimeout
}
if config.Advanced.UpdateBufferSize != nil {
updateBufferSize = *config.Advanced.UpdateBufferSize
}
if config.Advanced.ProcessorPort != nil {
processorPort = *config.Advanced.ProcessorPort
}

⚠️ Potential issue | 🟠 Major

Add informer-side validation for processorPort conflicts.

processorPort is user-overridable (Line 77-79) and then reused as container grpc port (Line 105), while Line 110 and Line 115 already reserve 8080/9091. The informer reconciliation path (internal/controller/flp/flp_informer_reconciler.go, Line 53-99) does not call shared port-conflict validation, so conflicting values can slip through.

Proposed patch
@@
 	if config.Advanced != nil {
@@
 		if config.Advanced.ProcessorPort != nil {
 			processorPort = *config.Advanced.ProcessorPort
 		}
 	}
+
+	// Reserved informer container ports
+	const (
+		healthPort  int32 = 8080
+		metricsPort int32 = 9091
+	)
+	if processorPort == healthPort || processorPort == metricsPort {
+		return nil, fmt.Errorf("invalid processor port %d: conflicts with reserved informer ports (%d health, %d metrics)", processorPort, healthPort, metricsPort)
+	}
@@
 		{
 			Name:          "health",
-			ContainerPort: 8080,
+			ContainerPort: healthPort,
 			Protocol:      corev1.ProtocolTCP,
 		},
 		{
 			Name:          "metrics",
-			ContainerPort: 9091,
+			ContainerPort: metricsPort,
 			Protocol:      corev1.ProtocolTCP,
 		},
@@
 				Path: "/healthz",
-				Port: intstr.FromInt(8080),
+				Port: intstr.FromInt(int(healthPort)),
@@
 				Path: "/ready",
-				Port: intstr.FromInt(8080),
+				Port: intstr.FromInt(int(healthPort)),

Also applies to: 101-118

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_builder.go` around lines 63 - 79, After
reading config.Advanced.ProcessorPort into processorPort, validate it against
the informer’s reserved ports before constructing the container/grpc port:
ensure processorPort does not equal 8080 or 9091 (and any other ports used by
the informer/process like k8scachePort), and if it conflicts return an error (or
reject the value) from the informer builder path so the reconciler
(flp_informer_reconciler.go) cannot create a pod with conflicting ports;
implement the check right after the assignment from
config.Advanced.ProcessorPort and before using processorPort to populate the
container port fields, referencing processorPort and
config.Advanced.ProcessorPort in the validation logic.

jpinsonneau previously approved these changes Apr 15, 2026

@jpinsonneau (Member) left a comment

Thanks for addressing the feedback. That looks good to me now.

Let's just try to fix the tests:

Metric netobserv_ingest_flows_processed not found after 180 secs!!!

@openshift-ci (Bot) added the lgtm label Apr 15, 2026
@jpinsonneau (Member):

/test e2e-operator

@openshift-ci (Bot) commented Apr 16, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci-robot (Collaborator) commented Apr 16, 2026

@leandroberetta: This pull request references NETOBSERV-2675 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Description

Adds a dedicated flp-informers deployment (and RBAC) managed by the NetObserv operator, and wires flowlogs-pipeline processors to expose the k8scache gRPC endpoint so centralized Kubernetes informers can push cache updates to FLP.

Dependencies

netobserv/flowlogs-pipeline#1221

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

  • New Features

  • Added a centralized FLP Informers component with configurable replicas, resources and advanced tuning.

  • Configuration Changes

  • New CRD fields under spec.processor.informers; default FLP metrics port updated (9090 → 9401).

  • Validation rejects user-configured ports that conflict with the reserved k8scache port (9090).

  • Chores

  • Expanded RBAC to include coordination.k8s.io/leases permissions.

  • Tests

  • Added unit tests for port-conflict validation.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai (Bot) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/flp/flp_common_objects.go (1)

160-207: ⚠️ Potential issue | 🟡 Minor

k8scache port & args are added unconditionally — gate on informers.enabled.

The k8scache container port (Line 160-164) and the --k8scache.* args (Line 203-207) are appended regardless of whether spec.processor.informers.enabled is true. When the feature is off, you still bind 9090 and pass flags the user didn't opt into. This also means FLP must always understand the --k8scache.* flags, tightening the version coupling with flowlogs-pipeline#1221.

🔧 Suggested gating
-	ports = append(ports, corev1.ContainerPort{
-		Name:          "k8scache",
-		ContainerPort: k8scachePort,
-		Protocol:      corev1.ProtocolTCP,
-	})
+	informersEnabled := desired.Processor.Informers != nil &&
+		(desired.Processor.Informers.Enabled == nil || *desired.Processor.Informers.Enabled)
+	if informersEnabled {
+		ports = append(ports, corev1.ContainerPort{
+			Name:          "k8scache",
+			ContainerPort: k8scachePort,
+			Protocol:      corev1.ProtocolTCP,
+		})
+	}
@@
-		Args: []string{
-			fmt.Sprintf(`--config=%s/%s`, configPath, configFile),
-			fmt.Sprintf("--k8scache.port=%d", k8scachePort),
-			"--k8scache.address=0.0.0.0",
-		},
+		Args: func() []string {
+			a := []string{fmt.Sprintf(`--config=%s/%s`, configPath, configFile)}
+			if informersEnabled {
+				a = append(a, fmt.Sprintf("--k8scache.port=%d", k8scachePort), "--k8scache.address=0.0.0.0")
+			}
+			return a
+		}(),

Likewise, validatePortConflicts only needs to fire when informers are enabled.
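
A minimal sketch of that gating, assuming the same informers-enabled logic as the diff above (the wrapper name is invented for illustration):

func validatePortConflictsIfInformersEnabled(desired *flowslatest.FlowCollectorSpec) error {
	// nil Enabled counts as enabled, matching the CRD default of true.
	enabled := desired.Processor.Informers != nil &&
		(desired.Processor.Informers.Enabled == nil || *desired.Processor.Informers.Enabled)
	if !enabled {
		return nil
	}
	return validatePortConflicts(desired)
}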

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_common_objects.go` around lines 160 - 207, The
k8scache ContainerPort and the "--k8scache.*" args are being added
unconditionally; change the logic to only append the k8scache port (the
corev1.ContainerPort with Name "k8scache") and the
"--k8scache.port/--k8scache.address" entries in the container.Args for the
constants.FLPName container when the informers feature is enabled (i.e., gate on
the informers.enabled flag in advancedConfig or the equivalent field). Also
ensure any call to validatePortConflicts is only invoked when informers are
enabled so we don’t validate 9090 unless the informers are on. Locate these
changes around where ports are appended (k8scache port block), where
container.Args are built (k8scache args), and where validatePortConflicts is
called and wrap them in the same informers-enabled conditional.
♻️ Duplicate comments (1)
config/crd/bases/flows.netobserv.io_flowcollectors.yaml (1)

5150-5157: ⚠️ Potential issue | 🟡 Minor

replicas validation contradicts the documented requirement.

minimum: 1 conflicts with the field description saying 2 is required when enabled=true. Please either enforce the conditional constraint or reword it as a recommendation.

Suggested schema-side enforcement
                     informers:
                       description: |-
                         `informers` configuration for centralized Kubernetes informers that push cache updates to flowlogs-pipeline processors.
                         This reduces load on the Kubernetes API server by having a single component (flp-informers) query the API instead of N FLP processors.
                         When enabled, a dedicated `flp-informers` deployment is created that watches Kubernetes resources and pushes updates via gRPC.
                       properties:
@@
                         replicas:
                           default: 2
                           description: |-
                             `replicas` defines the number of replicas for the flp-informers deployment.
                             For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
                           format: int32
                           minimum: 1
                           type: integer
@@
+                      x-kubernetes-validations:
+                        - message: "informers.replicas must be >= 2 when informers.enabled is true."
+                          rule: "!has(self.enabled) || !self.enabled || !has(self.replicas) || self.replicas >= 2"
                       type: object
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml` around lines 5150 -
5157, The replicas schema for the flp-informers deployment currently declares
minimum: 1 but the description mandates "a minimum of 2 replicas when `enabled`
is `true` — fix this by either updating the schema to enforce the constraint or
rewording the description: either change the `replicas` validation so minimum: 2
and/or add an if/then conditional that checks the `enabled` boolean and requires
replicas >= 2 when `enabled: true`, or keep minimum: 1 but rephrase the
description to a recommendation (e.g., "recommended minimum of 2 when `enabled`
is true"); locate the `replicas` field and the related `enabled` flag in the
flowcollectors CRD to apply the chosen fix.
🧹 Nitpick comments (8)
internal/controller/flp/flp_common_objects.go (3)

98-124: Extract the port-name literal and simplify the repetition.

Four near-identical branches with the same error template. A small table collapses this nicely and makes future additions (e.g., another reserved port) one-line changes:

♻️ Suggested refactor
-func validatePortConflicts(desired *flowslatest.FlowCollectorSpec) error {
-	advancedConfig := helper.GetAdvancedProcessorConfig(desired)
-
-	// Check FLP port
-	if advancedConfig.Port != nil && *advancedConfig.Port == k8scachePort {
-		return fmt.Errorf("flowlogs-pipeline port %d conflicts with reserved k8scache port %d", *advancedConfig.Port, k8scachePort)
-	}
-
-	// Check health port
-	if advancedConfig.HealthPort != nil && *advancedConfig.HealthPort == k8scachePort {
-		return fmt.Errorf("flowlogs-pipeline health port %d conflicts with reserved k8scache port %d", *advancedConfig.HealthPort, k8scachePort)
-	}
-
-	// Check metrics port
-	metricsPort := desired.Processor.GetMetricsPort()
-	if metricsPort == k8scachePort {
-		return fmt.Errorf("flowlogs-pipeline metrics port %d conflicts with reserved k8scache port %d", metricsPort, k8scachePort)
-	}
-
-	// Check profile port (optional)
-	if advancedConfig.ProfilePort != nil && *advancedConfig.ProfilePort == k8scachePort {
-		return fmt.Errorf("flowlogs-pipeline profile port %d conflicts with reserved k8scache port %d", *advancedConfig.ProfilePort, k8scachePort)
-	}
-
-	return nil
-}
+func validatePortConflicts(desired *flowslatest.FlowCollectorSpec) error {
+	adv := helper.GetAdvancedProcessorConfig(desired)
+	checks := []struct {
+		label string
+		port  *int32
+	}{
+		{"port", adv.Port},
+		{"health port", adv.HealthPort},
+		{"metrics port", ptr.To(desired.Processor.GetMetricsPort())},
+		{"profile port", adv.ProfilePort},
+	}
+	for _, c := range checks {
+		if c.port != nil && *c.port == k8scachePort {
+			return fmt.Errorf("flowlogs-pipeline %s %d conflicts with reserved k8scache port %d", c.label, *c.port, k8scachePort)
+		}
+	}
+	return nil
+}

Also — this is a spec validation and arguably belongs next to validateFLP() in the webhook, not inside the builder. Builders today silently swallow the error in tests (b, _ := newMonolithBuilder(...)), so a misconfig would surface as a broken controller reconcile rather than a clean webhook rejection.
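
A rough sketch of the webhook-side hook, assuming the check (or a copy of it) lives next to validateFLP and that v.fc is the FlowCollectorSpec as in validateInformers; validateProcessorPorts is an invented name:

func (v *validator) validateProcessorPorts() {
	// Reject the conflict at admission time instead of surfacing it as a reconcile error.
	if err := validatePortConflicts(v.fc); err != nil {
		v.errors = append(v.errors, err)
	}
}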

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_common_objects.go` around lines 98 - 124, The
validatePortConflicts function repeats the same error template for multiple
ports; refactor it by extracting the error message template and building a small
map/slice of (portName, portValue) entries using advancedConfig.Port,
advancedConfig.HealthPort, desired.Processor.GetMetricsPort(), and
advancedConfig.ProfilePort, then iterate and return fmt.Errorf(fmtTemplate, val,
k8scachePort) when a conflict is found; update references to the unique symbols
validatePortConflicts, advancedConfig, desired.Processor.GetMetricsPort, and
k8scachePort to implement the loop-based check and remove duplicated branches —
also consider moving this spec validation into the webhook near validateFLP so
misconfigs are rejected early instead of being swallowed by builders.

161-161: Promote "k8scache" to a named const.

healthPortName, prometheusPortName, profilePortName all have constants above — this one sticks out as a bare string literal. Add k8scachePortName = "k8scache" alongside k8scachePort for consistency.
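
For example (sketch only; k8scachePort is the existing constant):

const k8scachePortName = "k8scache"

	ports = append(ports, corev1.ContainerPort{
		Name:          k8scachePortName,
		ContainerPort: k8scachePort,
		Protocol:      corev1.ProtocolTCP,
	})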

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_common_objects.go` at line 161, Promote the
literal "k8scache" to a named constant by adding k8scachePortName = "k8scache"
alongside the existing k8scachePort constant, and replace the literal Name:
"k8scache" with Name: k8scachePortName (or use the constant wherever that port
name is referenced). Update references to the string in the same file so usage
is consistent with healthPortName/prometheusPortName/profilePortName and the
existing k8scachePort symbol.

30-30: Consider exposing k8scachePort as configurable (e.g., under Advanced).

Hardcoding a well-known port like 9090 invites collisions with other sidecars or injected tooling (kube-rbac-proxy, Prometheus node-exporter style setups, etc.). Promoting this to an advanced-config field keeps the reserved default but gives operators an escape hatch — which also neatly avoids the whole validatePortConflicts dance.
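
A rough sketch of that escape hatch, using invented names (K8sCachePort is not an existing API field) and a stand-in struct so the example stays self-contained; ptr is k8s.io/utils/ptr:

// Hypothetical advanced-config field for the informers gRPC port.
type advancedInformersConfig struct {
	// +kubebuilder:default=9090
	// +optional
	K8sCachePort *int32 `json:"k8sCachePort,omitempty"`
}

const defaultK8sCachePort int32 = 9090

// k8sCachePortOrDefault keeps 9090 as the reserved default but lets operators override it.
func k8sCachePortOrDefault(adv *advancedInformersConfig) int32 {
	if adv == nil {
		return defaultK8sCachePort
	}
	return ptr.Deref(adv.K8sCachePort, defaultK8sCachePort)
}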

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_common_objects.go` at line 30, Replace the
hardcoded k8scachePort = 9090 with a configurable Advanced option: add a field
(e.g., Advanced.K8sCachePort int) to the existing configuration struct and
supply a default of 9090 when loading/constructing the config, then replace uses
of the package-level k8scachePort symbol with reads from that config (falling
back to 9090 if unset); also update any config parsing/CLI flags and
documentation to expose the new Advanced option and keep existing validation (or
simplify validatePortConflicts if desired).
internal/controller/flp/flp_test.go (1)

814-850: Good coverage — consider one small extension.

validatePortConflicts is invoked from both newMonolithBuilder and newTransfoBuilder, but only the monolith path is exercised here. A single mirror case via newTransfoBuilder (or parameterizing the constructor) would guard against someone accidentally dropping the call from one constructor.

Also an ideal candidate for table-driven tests — the five cases repeat the same three-line pattern:

♻️ Table-driven sketch
cases := []struct {
    name   string
    mutate func(*flowslatest.FlowCollectorSpec)
    errSub string
}{
    {"flp port", func(c *flowslatest.FlowCollectorSpec) { c.Processor.Advanced.Port = ptr.To(int32(9090)) }, "flowlogs-pipeline port 9090"},
    {"health port", func(c *flowslatest.FlowCollectorSpec) { c.Processor.Advanced.HealthPort = ptr.To(int32(9090)) }, "health port 9090"},
    {"metrics port", func(c *flowslatest.FlowCollectorSpec) { c.Processor.Metrics.Server.Port = ptr.To(int32(9090)) }, "metrics port 9090"},
    {"profile port", func(c *flowslatest.FlowCollectorSpec) { c.Processor.Advanced.ProfilePort = ptr.To(int32(9090)) }, "profile port 9090"},
    {"ok", func(*flowslatest.FlowCollectorSpec) {}, ""},
}
for _, tc := range cases {
    t.Run(tc.name, func(t *testing.T) { /* ... */ })
}

Based on learnings, testify is an approved framework in this repo, so no need to port to Ginkgo/Gomega just for this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_test.go` around lines 814 - 850,
TestPortConflictValidation only exercises validatePortConflicts via
newMonolithBuilder; add a mirror set of assertions that call newTransfoBuilder
to ensure validatePortConflicts is enforced there too (or refactor
TestPortConflictValidation into a table-driven loop that runs each test case
against both newMonolithBuilder and newTransfoBuilder). Keep the same mutation
functions and error substring checks (use the existing cases for
Processor.Advanced.Port, HealthPort, Metrics.Server.Port, ProfilePort and the
no-conflict case) and assert the same error messages when invoking
newTransfoBuilder (or both builders) so the coverage is identical for
validatePortConflicts, referencing the existing TestPortConflictValidation,
newMonolithBuilder, newTransfoBuilder and validatePortConflicts symbols.
bundle/manifests/netobserv-operator.clusterserviceversion.yaml (1)

469-486: Missing UI descriptors for consistency.

Other toggles/advanced sections in this CSV use booleanSwitch for enabled flags and hidden/advanced markers for advanced subsections (e.g., consolePlugin.enable, processor.advanced). Consider aligning processor.informers.enabled with booleanSwitch and marking processor.informers.advanced fields with advanced (and a fieldDependency:processor.informers.enabled:true) so the OpenShift form matches existing patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bundle/manifests/netobserv-operator.clusterserviceversion.yaml` around lines
469 - 486, The CSV UI descriptors should mirror existing patterns: change the
descriptor for processor.informers.enabled to use booleanSwitch (like
consolePlugin.enable) and mark the processor.informers.advanced section and its
child fields (processor.informers.advanced.batchSize,
processor.informers.advanced.processorPort,
processor.informers.advanced.resyncInterval,
processor.informers.advanced.sendTimeout,
processor.informers.advanced.updateBufferSize) as advanced/hidden; add a
fieldDependency on processor.informers.enabled:true so the advanced fields only
appear when the Informers toggle is enabled and the OpenShift form behavior
matches other components.
internal/controller/flp/flp_controller_test.go (1)

168-174: Consider asserting informer Deployment/SA lifecycle across Kafka transitions.

The Kafka context verifies the shared informer CRB but doesn't assert the flp-informers Deployment/ServiceAccount persist across the mono→Kafka→mono transitions (Contexts "With Kafka" and "Back without Kafka"). Since this is now a shared, independent deployment, a quick Eventually ... Get(flpKeyInformer, &Deployment{}).Should(Succeed()) in both contexts would guard against accidental teardown on model switch.

Also applies to: 330-337
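
Something along these lines, sketched with the suite's assumed k8sClient, ctx, timeout and interval helpers and the existing flpKeyInformer key:

// In both the "With Kafka" and "Back without Kafka" contexts:
Eventually(func() error {
	return k8sClient.Get(ctx, flpKeyInformer, &appsv1.Deployment{})
}, timeout, interval).Should(Succeed())
Eventually(func() error {
	return k8sClient.Get(ctx, flpKeyInformer, &corev1.ServiceAccount{})
}, timeout, interval).Should(Succeed())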

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_controller_test.go` around lines 168 - 174, Add
assertions in the "With Kafka" and "Back without Kafka" test contexts to verify
the flp-informers Deployment and ServiceAccount persist across the
mono→Kafka→mono transitions: use Eventually with k8sClient.Get and the existing
flpKeyInformer key to ensure Get(ctx, flpKeyInformer, &appsv1.Deployment{}) and
Get(ctx, flpKeyInformer, &corev1.ServiceAccount{}) both succeed (similar to the
existing ClusterRoleBinding check using rb3), so the shared informer
Deployment/SA is not accidentally torn down when switching models.
internal/controller/flp/flp_informer_reconciler.go (2)

66-70: Nit: collapse the nil-check with ptr.Deref.

Minor readability tweak — no behavior change.

Proposed diff
-	// Check if informers are enabled (default: true)
-	enabled := true
-	if desired.Spec.Processor.Informers != nil && desired.Spec.Processor.Informers.Enabled != nil {
-		enabled = *desired.Spec.Processor.Informers.Enabled
-	}
-
-	if !enabled {
+	// Check if informers are enabled (default: true)
+	if desired.Spec.Processor.Informers != nil && !ptr.Deref(desired.Spec.Processor.Informers.Enabled, true) {

Requires k8s.io/utils/ptr import. Skip if the project prefers the explicit form.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_reconciler.go` around lines 66 - 70,
Replace the manual nil-check that initializes the local variable enabled from
desired.Spec.Processor.Informers.Enabled with a single ptr.Deref call: set
enabled := ptr.Deref(desired.Spec.Processor.Informers.Enabled, true) (or the
equivalent expression) so the default true is used when the pointer is nil; add
the import "k8s.io/utils/ptr". Update the block that references
desired.Spec.Processor.Informers.Enabled and remove the explicit if nil { ... }
branch.

101-108: SA drift won't be corrected — confirm this is acceptable.

reconcileServiceAccount only creates when missing. If labels/annotations/imagePullSecrets on the SA ever diverge from the builder (e.g., user edit, future builder changes), it won't self-heal until the SA is deleted. The inline comment acknowledges it, so flagging as FYI in case the informers SA gains secrets/annotations later.
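
If it ever needs to self-heal, the shape could be roughly as below (illustrative sketch; the real reconciler goes through r.Managed / r.CreateOwned rather than a bare client):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func reconcileServiceAccount(ctx context.Context, c client.Client, desired *corev1.ServiceAccount) error {
	var current corev1.ServiceAccount
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), &current)
	if apierrors.IsNotFound(err) {
		return c.Create(ctx, desired)
	}
	if err != nil {
		return err
	}
	// Only reconcile the fields the builder owns, to avoid fighting other writers.
	if !equality.Semantic.DeepEqual(current.Labels, desired.Labels) ||
		!equality.Semantic.DeepEqual(current.Annotations, desired.Annotations) ||
		!equality.Semantic.DeepEqual(current.ImagePullSecrets, desired.ImagePullSecrets) {
		current.Labels = desired.Labels
		current.Annotations = desired.Annotations
		current.ImagePullSecrets = desired.ImagePullSecrets
		return c.Update(ctx, &current)
	}
	return nil
}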

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_informer_reconciler.go` around lines 101 - 108,
reconcileServiceAccount in informerReconciler currently only creates the
ServiceAccount via builder.serviceAccount() when
r.Managed.Exists(r.serviceAccount) is false, so
label/annotation/imagePullSecrets drift won't be corrected; update
reconcileServiceAccount to fetch the existing SA, compare labels, annotations
and imagePullSecrets with builder.serviceAccount() and perform a patch/update
(using r.Update or a strategic merge/patch method) when differences exist while
preserving managed fields, or explicitly document/guard that this resource is
intentionally immutable; reference reconcileServiceAccount, informerReconciler,
builder.serviceAccount(), r.Managed.Exists and r.CreateOwned when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/flowcollector/v1beta2/flowcollector_validation_webhook.go`:
- Around line 460-481: In validateInformers(), treat a nil Enabled pointer as
true (respect the kubebuilder default) by changing the enabled calculation to
consider nil as enabled instead of disabled (refer to
v.fc.Processor.Informers.Enabled and the validateInformers function); keep the
replicas >=2 enforcement when enabled but update the error message or coordinate
with the CRD (Replicas' +kubebuilder:validation:Minimum) so documentation/CRD
marker and the webhook behavior are aligned (either relax webhook to match CRD
or change CRD/marker to indicate minimum 2).

In `@helm/crds/flows.netobserv.io_flowcollectors.yaml`:
- Around line 5154-5161: The description for the replicas field claims "a
minimum of 2 replicas is required" but the schema still allows minimum: 1;
update the CRD by either changing the schema to enforce minimum: 2 (replace
"minimum: 1" with "minimum: 2" for the replicas property) or, if you don't want
to enforce it, reword the description to say "recommended for HA" instead of
"required"; locate the replicas block (contains "replicas:", "default: 2", and
the description) and apply the chosen change so description and validation are
consistent.
- Around line 5110-5230: The CRD and status population are missing the informers
component; add an Informers *FlowCollectorComponentStatus field to the
FlowCollectorComponentsStatus struct and update populateComponentStatuses() to
handle status.FLPInformers (mirror the existing cases for
Agent/Processor/Plugin) so that when the controller calls
status.ForComponent(status.FLPInformers) and uses SetReady()/SetUnused() the
value is preserved and emitted in status.components.informers; ensure the CRD
description list is updated to include informers as well.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 46903d42-c9f4-4324-9404-64e76dab93f5

📥 Commits

Reviewing files that changed from the base of the PR and between 326ff55 and 9f0160e.

⛔ Files ignored due to path filters (1)
  • api/flowcollector/v1beta2/zz_generated.deepcopy.go is excluded by !**/zz_generated.deepcopy.go
📒 Files selected for processing (26)
  • api/flowcollector/v1beta2/flowcollector_types.go
  • api/flowcollector/v1beta2/flowcollector_validation_webhook.go
  • bundle/manifests/flows.netobserv.io_flowcollectors.yaml
  • bundle/manifests/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • bundle/manifests/netobserv-operator.clusterserviceversion.yaml
  • config/crd/bases/flows.netobserv.io_flowcollectors.yaml
  • config/rbac/component_roles.yaml
  • config/rbac/role.yaml
  • config/samples/flows_v1beta2_flowcollector.yaml
  • docs/FlowCollector.md
  • helm/crds/flows.netobserv.io_flowcollectors.yaml
  • helm/templates/clusterrole.yaml
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrolebinding.yaml
  • internal/controller/flp/flp_common_objects.go
  • internal/controller/flp/flp_controller.go
  • internal/controller/flp/flp_controller_test.go
  • internal/controller/flp/flp_informer_builder.go
  • internal/controller/flp/flp_informer_reconciler.go
  • internal/controller/flp/flp_monolith_objects.go
  • internal/controller/flp/flp_monolith_reconciler.go
  • internal/controller/flp/flp_test.go
  • internal/controller/flp/flp_transfo_objects.go
  • internal/controller/flp/flp_transfo_reconciler.go
  • internal/pkg/manager/manager.go
  • internal/pkg/manager/status/status_manager.go
💤 Files with no reviewable changes (2)
  • internal/controller/flp/flp_monolith_reconciler.go
  • internal/controller/flp/flp_transfo_reconciler.go
✅ Files skipped from review due to trivial changes (11)
  • internal/pkg/manager/manager.go
  • internal/controller/flp/flp_transfo_objects.go
  • bundle/manifests/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • internal/pkg/manager/status/status_manager.go
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • config/samples/flows_v1beta2_flowcollector.yaml
  • helm/templates/netobserv-informers_rbac.authorization.k8s.io_v1_clusterrolebinding.yaml
  • helm/templates/clusterrole.yaml
  • internal/controller/flp/flp_informer_builder.go
  • internal/controller/flp/flp_controller.go
  • docs/FlowCollector.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • config/rbac/role.yaml
  • internal/controller/flp/flp_monolith_objects.go
  • config/rbac/component_roles.yaml
  • api/flowcollector/v1beta2/flowcollector_types.go
  • bundle/manifests/flows.netobserv.io_flowcollectors.yaml

Comment on lines +460 to +481 in api/flowcollector/v1beta2/flowcollector_validation_webhook.go
func (v *validator) validateInformers() {
	if v.fc.Processor.Informers == nil {
		return
	}

	// Check if enabled
	enabled := v.fc.Processor.Informers.Enabled != nil && *v.fc.Processor.Informers.Enabled

	if enabled {
		// When enabled, replicas must be at least 2 for high availability
		replicas := int32(2) // default
		if v.fc.Processor.Informers.Replicas != nil {
			replicas = *v.fc.Processor.Informers.Replicas
		}
		if replicas < 2 {
			v.errors = append(
				v.errors,
				fmt.Errorf("spec.processor.informers.replicas must be at least 2 when informers are enabled (got %d). Centralized informers require high availability to avoid losing the entire flow collection pipeline in case of failure", replicas),
			)
		}
	}
}

⚠️ Potential issue | 🟡 Minor

Default-enabled nil check may skip validation.

Enabled has kubebuilder default true, but this code treats nil as disabled. If the webhook runs before CRD defaulting (or on an object that bypassed defaulting), a user with informers: {} slips past the replica check even though the feature is effectively enabled. Consider treating nil as true to match the declared default:

🔧 Suggested tweak
-	// Check if enabled
-	enabled := v.fc.Processor.Informers.Enabled != nil && *v.fc.Processor.Informers.Enabled
+	// Check if enabled (defaults to true per CRD)
+	enabled := v.fc.Processor.Informers.Enabled == nil || *v.fc.Processor.Informers.Enabled

Also note: the CRD declares +kubebuilder:validation:Minimum=1 on Replicas, but this webhook enforces >=2 when enabled. Worth aligning the docstring on Replicas or the CRD marker so users aren't surprised by a "1 is valid here, but not really" situation.
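
For instance, the docstring and markers could be kept honest along these lines (sketch only; the struct name and layout are illustrative, not the actual v1beta2 types):

type FLPInformers struct {
	// Set `enabled` to true to deploy the centralized flp-informers component.
	// +kubebuilder:default=true
	// +optional
	Enabled *bool `json:"enabled,omitempty"`

	// `replicas` defines the number of flp-informers replicas.
	// The schema accepts 1, but the validation webhook rejects values below 2 while `enabled` is true.
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=2
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`
}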

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/flowcollector/v1beta2/flowcollector_validation_webhook.go` around lines
460 - 481, In validateInformers(), treat a nil Enabled pointer as true (respect
the kubebuilder default) by changing the enabled calculation to consider nil as
enabled instead of disabled (refer to v.fc.Processor.Informers.Enabled and the
validateInformers function); keep the replicas >=2 enforcement when enabled but
update the error message or coordinate with the CRD (Replicas'
+kubebuilder:validation:Minimum) so documentation/CRD marker and the webhook
behavior are aligned (either relax webhook to match CRD or change CRD/marker to
indicate minimum 2).

Comment on lines +5154 to +5161 in helm/crds/flows.netobserv.io_flowcollectors.yaml
replicas:
  default: 2
  description: |-
    `replicas` defines the number of replicas for the flp-informers deployment.
    For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
  format: int32
  minimum: 1
  type: integer

⚠️ Potential issue | 🟡 Minor

The HA requirement is stronger than the schema.

Line 5158 says 2 replicas are required, but the schema still allows replicas: 1. Either enforce that with validation or reword this to “recommended for HA”.

Suggested doc-only fix
                         replicas:
                           default: 2
                           description: |-
                             `replicas` defines the number of replicas for the flp-informers deployment.
-                            For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
+                            Set this to `2` or more for high availability.
                           format: int32
                           minimum: 1
                           type: integer
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-replicas:
-  default: 2
-  description: |-
-    `replicas` defines the number of replicas for the flp-informers deployment.
-    For high availability, a minimum of 2 replicas is required when `enabled` is `true`.
-  format: int32
-  minimum: 1
-  type: integer
+replicas:
+  default: 2
+  description: |-
+    `replicas` defines the number of replicas for the flp-informers deployment.
+    Set this to `2` or more for high availability.
+  format: int32
+  minimum: 1
+  type: integer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helm/crds/flows.netobserv.io_flowcollectors.yaml` around lines 5154 - 5161,
The description for the replicas field claims "a minimum of 2 replicas is
required" but the schema still allows minimum: 1; update the CRD by either
changing the schema to enforce minimum: 2 (replace "minimum: 1" with "minimum:
2" for the replicas property) or, if you don't want to enforce it, reword the
description to say "recommended for HA" instead of "required"; locate the
replicas block (contains "replicas:", "default: 2", and the description) and
apply the chosen change so description and validation are consistent.

@Amoghrd
Member

Amoghrd commented May 13, 2026

/ok-to-test

@openshift-ci openshift-ci Bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 13, 2026
@github-actions

New images:

quay.io/netobserv/network-observability-operator:8ac22a6
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-8ac22a6
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-8ac22a6

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:8ac22a6 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-8ac22a6

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-8ac22a6
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions Bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 13, 2026
@Amoghrd
Member

Amoghrd commented May 13, 2026

/ok-to-test

@openshift-ci openshift-ci Bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 13, 2026
@github-actions

New images:

quay.io/netobserv/network-observability-operator:e870c57
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-e870c57
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-e870c57

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:e870c57 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-e870c57

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-e870c57
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions Bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 14, 2026
@memodi
Member

memodi commented May 14, 2026

/ok-to-test

@openshift-ci openshift-ci Bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 14, 2026
@memodi
Member

memodi commented May 14, 2026

/ok-to-test

@github-actions

New images:

quay.io/netobserv/network-observability-operator:b21c226
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-b21c226
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-b21c226

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:b21c226 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-b21c226

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-b21c226
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions Bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 14, 2026
@leandroberetta leandroberetta added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 14, 2026
@github-actions

New images:

quay.io/netobserv/network-observability-operator:0e77cea
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-0e77cea
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-0e77cea

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:0e77cea make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-0e77cea

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-0e77cea
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@openshift-ci

openshift-ci Bot commented May 14, 2026

@leandroberetta: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-operator | bd79b6c | link | false | /test e2e-operator

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
